Abstract

In this paper, we present two low complexity algorithms that achieve capacity for the noiseless (d, k) constrained channel when $k = 2d + 1$, or when $k - d + 1$ is not prime. The first algorithm, symbol sliding, is a generalized version of the bit flipping algorithm introduced by Aviran et al. [1]. In addition to achieving capacity for (d, 2d + 1) constraints, it comes close to capacity in other cases. The second algorithm is based on interleaving, and is a generalized version of the bit stuffing algorithm introduced by Bender and Wolf [2]. This method uses fewer than $k - d$ biased bit streams to achieve capacity for (d, k) constraints with $k - d + 1$ not prime. In particular, the encoder for $(d, d + 2^m - 1)$ constraints, $1 \le m < \infty$, requires only m biased bit streams.
Index Terms

Bit stuffing, Bit flipping, (d, k) constrained sequences, Shannon capacity. 

I. Introduction 



A binary sequence is said to be (d, k) constrained if successive ones are separated by at least d and at most k consecutive zeros. There is a long history of the use of (d, k) constrained codes, and they are part of virtually all magnetic and optical disk recording systems today. The d constraint is used to regulate intersymbol interference and the k constraint is important for timing recovery. Over the years, gains in storage density, manufacturing tolerances and system margins have been possible with the use of (d, k) codes (see [?] for an overview).

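The basic issues in coding for a constrained channel are rate and complexity. With the assumption of a noiseless (d, k) constrained channel, the Shannon capacity, C(d, k), is given by

$$C(d,k) = \log_2 \lambda,$$

where $\lambda$ is the positive real root of $H_{d,k}(z) = 1$ and $H_{d,k}(z)$ is the characteristic polynomial of the (d, k) constraint, given by

$$H_{d,k}(z) = \begin{cases} \sum_{j=d+1}^{k+1} z^{-j}, & \text{when } k < \infty \\ z^{-1} + z^{-(d+1)}, & \text{when } k = \infty \end{cases} \tag{1}$$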

C(d,k) is an upper bound on the information rate, R(d,k), of any encoding algorithm. The encoder 
efficiency E(d, k) = R(d, k)/C(d, k) measures how close the code is to capacity. Clearly, the challenge 
is to design low complexity codes with high efficiency. Of particular interest are optimal codes/algorithms 
that are 100% efficient. 
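
As a rough numerical illustration of the capacity definition, the sketch below finds the positive root of $H_{d,k}(z) = 1$ by bisection and returns its base-2 logarithm. The function names `char_poly` and `capacity`, and the convention `k=None` for $k = \infty$, are assumptions made for this example only.

```python
import math

def char_poly(z, d, k=None):
    """H_{d,k}(z) as in (1); k=None stands for k = infinity."""
    if k is None:
        return z ** -1 + z ** -(d + 1)
    return sum(z ** -j for j in range(d + 1, k + 2))

def capacity(d, k=None, iters=100):
    """log2 of the positive root of H_{d,k}(z) = 1, located by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0  # H(1) >= 2 > 1 and H(2) <= 1 whenever 0 <= d < k
    for _ in range(iters):
        mid = (lo + hi) / 2
        if char_poly(mid, d, k) > 1:
            lo = mid   # H decreases in z, so the root lies to the right of mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)
```

Under these assumptions, `capacity(1, 3)` can be compared against tabulated (1, 3) capacity values, and `capacity(2, 4)` against `capacity(1, 2)`.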

Our aim in this work is to improve upon techniques that use very simple encoding ideas. In this regard, Bender and Wolf first proposed the bit stuffing algorithm to generate (d, k) constrained sequences. They showed that controlled insertion of bits into an appropriately biased, independent and identically distributed (i.i.d.) bit stream is asymptotically optimal for the $(d, \infty)$ and $(d, d + 1)$ constraints and near-optimal for other constraints. More recently, the bit flipping algorithm [1] was shown to improve bit stuffing rates for most (d, k) constraints and additionally achieve (2, 4) capacity. For all values of (d, k) with $k \neq d + 1$, $k \neq \infty$ and $(d, k) \neq (2, 4)$, bit flipping was shown to be suboptimal.

This work was supported by Seagate Research.

Footnote: Sometimes the notation $\lambda_{d,k}$ is used for emphasis when the constraint (d, k) is not already clear from context.

As a first step, both the bit stuffing and bit flipping algorithms use a distribution transformer (DT) to introduce a bias into the unbiased (Pr{0} = 0.5) binary, i.i.d. input stream. This has the effect of better conforming the input to the constraint before the actual bit insertion is performed. Wolf [?] observed that with the use of multiple such DTs, one could, in theory, generate enough degrees of freedom to produce optimal (d, k) sequences for all $0 \le d < k$. More precisely, $k - d$ DTs were shown to be sufficient for any given (d, k) constraint, $k < \infty$.

In this work, we introduce two code constructions that improve upon the aforementioned encoding algorithms. Our first construction is the symbol sliding algorithm, which improves bit stuffing and bit flipping rates while still using only a single DT. We prove the optimality of the proposed algorithm for all (d, k) constraints with $k = 2d + 1$, and show that bit stuffing and bit flipping can be derived as special cases of symbol sliding. Our second construction is based on interleaving and uses fewer than $k - d$ DTs to achieve capacity for all (d, k) constraints with $k - d + 1$ not prime. In the particular case of $(d, d + 2^m - 1)$ constraints, our construction requires only $m = \log_2(k - d + 1)$ DTs.

The remainder of this paper is organized as follows. We begin by reviewing the bit stuffing and bit flipping algorithms in Section II. We provide an interpretation of matching phrase probabilities to those of the maxentropic sequence and motivate symbol sliding using the example of the (1, 3) constraint. Next, in Section III, we study the symbol sliding algorithm and prove its optimality for (d, 2d + 1) constraints. We then proceed to discuss code constructions using interleaving in Section IV and conclude in Section V.

II. Background: Bit Stuffing and Bit Flipping 

Both our code constructions are inspired by the simple concept of stuffing bits to satisfy (d, k) constraints. In order to gain the necessary understanding and motivation behind our proposed constructions, we first review the bit stuffing algorithm.

A. The Bit Stuffing Algorithm 

Bit stuffing [2] is a simple, but surprisingly efficient, algorithm to generate (d, k) sequences. The block diagram of the bit stuffing encoder is shown in Fig. 1. It consists of a distribution transformer (DT) followed by a bit stuffer. The DT converts the unbiased (Pr{0} = 1/2), binary, i.i.d. input stream into a p-biased (Pr{0} = p), binary, i.i.d. stream. This conversion occurs at an asymptotic rate penalty of h(p) information bits per biased bit, where h(·) is the binary entropy function. However, with a suitable choice of p, the biasing can actually improve overall rates by better fitting input data to the constraint.

The p-biased stream generated by the DT is then fed into the bit stuffer, which sequentially performs the following two operations:

1) Insert a one after every run of $k - d$ consecutive zeros (skip this step if $k = \infty$)

2) Stuff d zeros after every one

The first operation produces a $(0, k - d)$ constrained sequence, which then acts as input for the second operation. Stuffing d zeros in the second operation translates the $(0, k - d)$ constraint to the required (d, k) constraint. Both these operations are invertible. Hence, with a one-to-one implementation of the DT (see [?] for a possible method), the bit stuffing decoder is a simple inverse of the encoder.
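
The two bit stuffer operations and their inverse can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the input is an already p-biased list of 0/1 integers, merges the two operations into a single pass, and uses the hypothetical names `bit_stuff` and `bit_unstuff`, with `k=None` standing for $k = \infty$.

```python
def bit_stuff(bits, d, k=None):
    """Insert a one after every run of k-d zeros, and d zeros after every one."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        if b == 1:
            out.extend([0] * d)          # stuff d zeros after a data one
            run = 0
        else:
            run += 1
            if k is not None and run == k - d:
                out.append(1)            # stuffed one closes the run of k-d zeros
                out.extend([0] * d)      # followed by its d stuffed zeros
                run = 0
    return out

def bit_unstuff(coded, d, k=None):
    """Inverse of bit_stuff: drop the stuffed ones and the stuffed zeros."""
    out, run, i = [], 0, 0
    while i < len(coded):
        if coded[i] == 1:
            out.append(1)
            run = 0
            i += 1 + d                   # skip the d zeros stuffed after the one
        else:
            out.append(0)
            run += 1
            i += 1
            if k is not None and run == k - d:
                i += 1 + d               # skip the stuffed one and its d zeros
                run = 0
    return out
```

For instance, with d = 1 and k = 3, every run of two data zeros is followed by a stuffed one, so no more than three zeros ever separate successive ones in the output.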

Bender and Wolf [2] showed that with a proper choice of bias p, the maximum average rate of the bit stuffing algorithm equals (d, k) capacity for $k = d + 1$ and $k = \infty$, and is strictly less than capacity for all other cases. We now provide an alternate interpretation of their results. This is based on matching phrase probabilities and will help motivate the need for our proposed algorithm in Section III.

Consider the finite state transition diagram (FSTD) of a (d, k) constraint, as shown in Fig. 2. Walks on the FSTD can be used to generate all possible (d, k) sequences. It is well known that there is a maxentropic walk, where edges must be traversed according to a set of optimal state transitions in order to achieve the highest possible rate. A code achieves capacity if and only if it produces a walk on the FSTD with the maxentropic state transition probabilities (shown in parentheses in Fig. 2).

Fig. 1. Block diagram of the bit stuffing encoder. DT(p) denotes a distribution transformer with bias p. $L_{in}$ denotes the average message word length at the input to the bit stuffer, and $L^0_{out}$ denotes the average output word length.

Fig. 2. FSTD with maxentropic state transition probabilities in parentheses. The labels on directed edges indicate the output bit.

Alternatively, one can describe a (d, k) sequence by the concatenation of independent phrases from the finite set $X = \{0^k1, 0^{k-1}1, \ldots, 0^{d+1}1, 0^d1\}$, where $0^t1$ represents a sequence of t zeros followed by a one. Each phrase corresponds to a cycle on the FSTD (see Fig. 2) that begins and ends in State 0. Note that if $k = \infty$, then $X = \{0, 0^d1\}$ and the FSTD in Fig. 2 can be redrawn with exactly d + 1 states. A code achieves capacity if and only if it generates constrained phrases with maxentropic probabilities. We can hence form a maxentropic phrase probability vector, $\Lambda$, which is given by [?]

$$\Lambda = \left[\lambda^{-(k+1)} \;\; \lambda^{-k} \;\; \cdots \;\; \lambda^{-(d+2)} \;\; \lambda^{-(d+1)}\right] \tag{2}$$

where $\lambda^{-t}$ denotes the maxentropic probability of occurrence of a (d, k) constrained phrase of length t, namely $0^{t-1}1$. With the bit stuffing algorithm, we can form the corresponding phrase probability vector

$$v^0 = \left[v^0_0 \;\; v^0_1 \;\; \cdots \;\; v^0_{k-d-1} \;\; v^0_{k-d}\right] \tag{3}$$

where $v^0_i$ denotes the probability of occurrence of the phrase $0^{k-i}1$. Table I lists the output (d, k) constrained phrases and corresponding message words at the input to the bit stuffer (see Fig. 1). Recall that the bit stuffer input is p-biased, thereby yielding the corresponding phrase probabilities $v^0_i$.



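TABLE I

Bit Stuffing Phrase Probabilities

| Index (i) | (d, k) constrained phrase | Corresponding message word | Phrase probability ($v^0_i$) |
|---|---|---|---|
| 0 | $0^k1$ | $0^{k-d}$ | $p^{k-d}$ |
| 1 | $0^{k-1}1$ | $0^{k-d-1}1$ | $p^{k-d-1}(1-p)$ |
| ... | ... | ... | ... |
| i | $0^{k-i}1$ | $0^{k-d-i}1$ | $p^{k-d-i}(1-p)$ |
| ... | ... | ... | ... |
| k-d-1 | $0^{d+1}1$ | $01$ | $p(1-p)$ |
| k-d | $0^d1$ | $1$ | $1-p$ |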

Hence, the bit stuffing algorithm achieves capacity if and only if $v^0 = \Lambda$. It can be verified that for $(d, d + 1)$ and $(d, \infty)$ constraints, $v^0$ exactly matches $\Lambda$ with $p = \lambda_{d,d+1}^{-(d+2)}$ and $p = \lambda_{d,\infty}^{-1}$, respectively. The following proposition restates a result of Bender and Wolf [2].

Proposition 1: For $d \ge 0$, $d + 2 \le k < \infty$, $v^0 \neq \Lambda$.

Proposition 1 implies that the maximum average bit stuffing rate is strictly less than capacity for $d + 2 \le k < \infty$. Our objective now is to improve bit stuffing rates for $d + 2 \le k < \infty$ while maintaining similar implementation complexity. We will show that this can be done by switching the bit stuffing phrase probabilities to better match the maxentropic vector $\Lambda$. As a first step, we show how this idea leads to the bit flipping algorithm and then generalize to symbol sliding in Section III.

B. The Bit Flipping Algorithm 

Consider a DT bias of p greater than 0.5. This means that a 0 is more likely than a 1 at the input to the bit stuffer (see Fig. 1). Recall that our goal is to match the phrase probability vector, $v^0$, to the maxentropic vector $\Lambda$. Looking at indices i = 0 and i = 1 in Table I, we note that $v^0_0 = p^{k-d} > v^0_1 = p^{k-d-1}(1-p)$, but the corresponding maxentropic probabilities are related as $\lambda^{-(k+1)} < \lambda^{-k}$. This suggests that switching the roles of $v^0_0$ and $v^0_1$ should result in a better match with $\Lambda$, thereby improving bit stuffing rates. Hence, we would like to replace the bit stuffer in Fig. 1 by a constrained encoder that sequentially performs the following three operations on the biased bit stream:

1) Track the run-length ($\rho$) of consecutive zeros, including the current bit (skip this step if $k = \infty$)

• If current bit is zero and $\rho = k - d - 1$, flip the next bit, reset $\rho$ and goto 1)

• If current bit is one and $\rho < k - d - 1$, reset $\rho$ and goto 1)

2) Insert a one after every run of $k - d$ consecutive zeros (skip this step if $k = \infty$)

3) Stuff d zeros after every one

The first operation performs the bit flipping (change ones to zeros and vice versa), which switches the roles of $v^0_0$ and $v^0_1$. The second and third operations are identical to the bit stuffer operations described in Section II-A. Since the bit flipping operation is invertible, the decoder once again is simply the encoder components arranged in the reverse order.

The algorithm described above is precisely the bit flipping algorithm proposed by Aviran et al. [1]. Their main results are summarized in the following two propositions.

Proposition 2: For $d \ge 1$, $d + 2 \le k < \infty$, the bit flipping algorithm achieves greater maximum average rate than the bit stuffing algorithm.

Proposition 3: For $d \ge 0$, $d + 2 \le k < \infty$, the bit flipping algorithm is optimal if and only if d = 2 and k = 4.

Proposition 2 mainly depends on the following two facts:

• The average bit flipping rate is greater than the average bit stuffing rate for all values of bias p greater than 0.5

• The rate maximizing bit stuffing bias is greater than 0.5 for all $d \ge 2$, $d + 2 \le k < \infty$

Proposition 3 states that the new phrase vector, say $v^1$, formed by swapping the roles of $v^0_0$ and $v^0_1$ in $v^0$, exactly matches $\Lambda$ only for the (2, 4) constraint. As will be seen later, this optimality of the bit flipping algorithm is possible only because of the binary capacity equality C(2, 4) = C(1, 2).

C. Motivating Example: The (1,3) Constraint 

Thus far, we have seen a phrase probability interpretation of bit stuffing and how switching two entries of the phrase probability vector $v^0$ improved rates with the bit flipping algorithm. This prompts us to generalize the idea of switching phrase probabilities to better match the maxentropic vector $\Lambda$. The following example of the (1, 3) constraint motivates this idea.

Consider the phrase probabilities listed in Table II. From Proposition 1, it follows that the maximum average bit stuffing rate is strictly less than (1, 3) capacity. Proposition 3 states that (1, 3) bit flipping rates are also suboptimal. Now consider the phrase probabilities $v^2_i$ as listed in the last column of Table II. We call this symbol sliding with index 2. This means that the role of $v^0_0$ is slid down to that of $v^0_2$ (index 2), with $v^0_1$ and $v^0_2$ being pushed up an index each, thus yielding the phrase probability vector $v^2 = [v^2_0 \;\; v^2_1 \;\; v^2_2]$. It can be seen that with a bias of $p = \lambda^{-1}$, $v^2$ exactly matches $\Lambda$, and the average rate is equal to the (1, 3) capacity. Hence, symbol sliding with index 2 achieves capacity for the (1, 3) constraint where both bit stuffing and bit flipping fall short. This prompted us to study symbol sliding in greater depth.

TABLE II

Phrase probabilities for the (1, 3) constraint

| Index (i) | (1, 3) constrained phrase | Maxentropic prob. ($\Lambda(i)$) | Bit stuffing ($v^0_i$) | Bit flipping ($v^1_i$) | Symbol sliding with index 2 ($v^2_i$) |
|---|---|---|---|---|---|
| 0 | $0^31$ | $\lambda^{-4}$ | $p^2$ | $p(1-p)$ | $p(1-p)$ |
| 1 | $0^21$ | $\lambda^{-3}$ | $p(1-p)$ | $p^2$ | $1-p$ |
| 2 | $0^11$ | $\lambda^{-2}$ | $1-p$ | $1-p$ | $p^2$ |
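
The (1, 3) example can be checked numerically. The sketch below is only an illustration under stated assumptions: it solves $z^{-2} + z^{-3} + z^{-4} = 1$ by bisection (the helper name `positive_root` is hypothetical), sets $p = \lambda^{-1}$, and compares the three candidate phrase probability vectors with the maxentropic one.

```python
def positive_root(lengths, iters=100):
    """Positive z with sum(z**-t for t in lengths) == 1, by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(mid ** -t for t in lengths) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = positive_root([2, 3, 4])   # (1, 3) phrases have lengths 2, 3 and 4
p = 1 / lam                      # bias considered in the text

maxentropic = [lam ** -4, lam ** -3, lam ** -2]
stuffing = [p ** 2, p * (1 - p), 1 - p]
flipping = [p * (1 - p), p ** 2, 1 - p]
sliding2 = [p * (1 - p), 1 - p, p ** 2]

def max_gap(v):
    """Largest entrywise distance between v and the maxentropic vector."""
    return max(abs(a - b) for a, b in zip(v, maxentropic))
```

With this bias, `max_gap(sliding2)` is zero up to rounding, while `max_gap(stuffing)` and `max_gap(flipping)` are clearly nonzero; the match relies on $\lambda^{-1} + \lambda^{-3} = 1$, that is, on C(1, 3) = C(2, ∞).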

III. Construction 1: The Symbol Sliding Algorithm 

The main idea behind symbol sliding is to switch the roles of bit stuffing phrase probabilities so as to better match the phrase probability vector to the maxentropic vector $\Lambda$. Symbol sliding is hence a function of a sliding index, j, $0 \le j \le k - d$, for a given (d, k) constraint. Symbol sliding with index j involves sliding down $v^0_0$ from index i = 0 to i = j and moving each of $v^0_1, \ldots, v^0_j$ up an index each, to yield the phrase probability vector $v^j = [v^j_0 \;\; v^j_1 \;\; \ldots \;\; v^j_{k-d}]$. Table III provides the full list of bit stuffing, bit flipping, symbol sliding and maxentropic phrase probabilities.

The symbol sliding encoder is shown in Fig. 3. It has a similar setup to the bit stuffing encoder, with the bit stuffer being replaced by a constrained encoder that sequentially performs the following two operations on the biased bit stream:

1) Track the run-length ($\rho$) of consecutive zeros, including the current bit (skip this step if $k = \infty$)

• If current bit is zero and $\rho = k - d$, replace the run of $k - d$ zeros with the phrase $0^{k-d-j}1$, reset $\rho$ and goto 1)

• If current bit is one and $k - d - j \le \rho \le k - d - 1$, insert a zero before the current bit, reset $\rho$ and goto 1)

• If current bit is one and $\rho < k - d - j$, reset $\rho$ and goto 1)

2) Stuff d zeros after every one

The first operation produces a $(0, k - d)$ constrained sequence with the appropriate phrase matching and the second operation translates this to a (d, k) constraint by stuffing d zeros. The latter is identical to the corresponding bit stuffing operation.
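
The constrained encoder and its inverse can be sketched at the level of message words. This is an illustrative reading of the operations above, not the authors' implementation: it assumes finite k, takes an already p-biased list of 0/1 integers, passes an incomplete trailing run of zeros through unchanged, and uses the hypothetical names `ss_encode` and `ss_decode`.

```python
def ss_encode(bits, d, k, j):
    """Symbol sliding with index j (0 <= j <= k-d), followed by d-zero stuffing."""
    out, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
            if run == k - d:
                # message word 0^(k-d) becomes the phrase 0^(k-d-j) 1
                out.extend([0] * (k - d - j) + [1] + [0] * d)
                run = 0
        else:
            # message word 0^run 1: one extra zero when run >= k-d-j
            zeros = run + 1 if run >= k - d - j else run
            out.extend([0] * zeros + [1] + [0] * d)
            run = 0
    out.extend([0] * run)        # incomplete trailing word, copied as is
    return out

def ss_decode(coded, d, k, j):
    """Inverse of ss_encode."""
    out, run, i = [], 0, 0
    while i < len(coded):
        if coded[i] == 0:
            run += 1
            i += 1
            continue
        if run == k - d - j:
            out.extend([0] * (k - d))            # phrase stood for a run of k-d zeros
        elif run > k - d - j:
            out.extend([0] * (run - 1) + [1])    # remove the inserted zero
        else:
            out.extend([0] * run + [1])          # short phrase, unchanged
        run = 0
        i += 1 + d                               # skip the d stuffed zeros
    out.extend([0] * run)
    return out
```

With j = 0 the mapping reduces to bit stuffing, and with j = 1 it swaps the words $0^{k-d}$ and $0^{k-d-1}1$, which is the bit flipping rule.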
Fig. 3. Block diagram of the symbol sliding encoder. $L_{in}$ denotes the average message word length at the input to the constrained encoder. $L^j_{out}$ denotes the average output word length for sliding index j.

The constrained decoder is a simple inverse of the constrained encoder. It sequentially performs the following operations on the (d, k) sequence:

1) Remove the d stuffed zeros after every one

2) Track the run-length ($\rho$) of consecutive zeros, including the current bit (skip this step if $k = \infty$)

• If current bit is one and $k - d - j + 1 \le \rho \le k - d$, remove the stuffed zero before the current bit, reset $\rho$ and goto 2)

• If current bit is one and $\rho = k - d - j$, replace the phrase $0^{k-d-j}1$ by a run of $k - d$ zeros, reset $\rho$ and goto 2)

• If current bit is one and $\rho < k - d - j$, reset $\rho$ and goto 2)

Let us denote by SS(j) the symbol sliding algorithm with index j. It can be seen from Table III that SS(0) and SS(1) are identical to the bit stuffing and bit flipping algorithms, respectively. Hence, bit stuffing and bit flipping are two special cases of the symbol sliding algorithm. We now summarize some important properties and prove the optimality of symbol sliding for certain constraints.

A. Properties of Symbol Sliding 

Lemma 1: Let $0 \le d < k < \infty$. Then, the maximum average rate achieved by SS(j) equals (d, k) capacity when $k = 2d + 1$ and sliding index $j = k - d = d + 1$.

Proof: We will show that SS(j) generates maxentropic (d, k) constrained phrases when $k = 2d + 1$ and $j = d + 1$. We start with a result of Ashley and Siegel [?], which states that the capacity of the $(d, 2d + 1)$ constraint is identical to that of the $(d + 1, \infty)$ constraint. Hence $\lambda_{d,2d+1}$ is the positive, real root of each of the following two characteristic equations

$$\sum_{l=d+1}^{2d+2} z^{-l} = 1, \qquad z^{-1} + z^{-(d+2)} = 1 \tag{4}$$

Now, let the sliding index $j = k - d = d + 1$. Consider a bias $p = \lambda_{d,2d+1}^{-1}$. Then, we have

$$v^{d+1}_{k-d} = p^{k-d} = p^{d+1} = \lambda_{d,2d+1}^{-(d+1)} \tag{5}$$

$$v^{d+1}_{k-d-1} = 1 - p = 1 - \lambda_{d,2d+1}^{-1} = \lambda_{d,2d+1}^{-(d+2)} \tag{6}$$

$$v^{d+1}_{k-d-i} = p^{i-1}(1-p) = \lambda_{d,2d+1}^{-(d+i+1)}, \quad 2 \le i \le k - d \tag{7}$$

where (6) follows from (4). Hence, we have $v^{d+1}_{k-d-i} = \lambda_{d,2d+1}^{-(d+i+1)}$ for all $0 \le i \le k - d$, whereby $v^{d+1} = \Lambda$. This proves the lemma. ■

Theorem 1: For $0 \le d < k$, the maximum average rate achieved by SS(j) equals the (d, k) capacity only in the following cases:

1) $j = 0$, $k = d + 1$

2) $j = 1$, $k = d + 1$

3) $j = 1$, $d = 2$, $k = 4$

4) $j = k - d$, $k = 2d + 1$

5) $j \ge 0$, $k = \infty$

For all other values of (d, k), the maximum average rate of SS(j) is strictly less than capacity for each j, $0 \le j \le k - d$.

Proof: We wish to find constraints (d, k) for which $v^j = \Lambda$ for some $0 \le j \le k - d$. We first note that when there is no k constraint, i.e., $k = \infty$, the symbol sliding operations reduce to simply stuffing d zeros after every one in the biased bit stream. This is identical to the corresponding bit stuffing operation, which has been shown to achieve capacity for $(d, \infty)$ constraints [2]. Case 5) in the theorem statement now follows. In the remainder of this proof, we focus only on (d, k) constraints with $k < \infty$.

Depending on the value of j, we have the following four cases.

Case 1: j = 0

This is identical to the bit stuffing algorithm. Let us first consider $k > d + 1$. For any such given (d, k) constraint, the following must hold (see Table III) in order for $v^0 = \Lambda$.

$$p = \lambda^{-1} \tag{8}$$

$$1 - p = \lambda^{-(d+1)} \tag{9}$$

$$p^{k-d} = \lambda^{-(k+1)} \tag{10}$$

(8) and (9) together imply that $\lambda^{-1} + \lambda^{-(d+1)} = 1$. However, this means that $\lambda$ is a root of the characteristic $(d, \infty)$ equation, $H_{d,\infty}(z) = 1$. Hence, (8) and (9) cannot be simultaneously satisfied for any finite $k > d + 1$. This leads us to Proposition 1, which was stated without proof in Section II.

Next, we look at $k = d + 1$. In this case, we only have two possible phrases, corresponding to indices i = 0, 1 in Table III. It can be seen that a bias of $p = \lambda^{-(d+2)}$ is optimal. This yields Case 1) of the theorem statement.

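TABLE III

Maxentropic phrase probabilities along with those of the bit stuffing, bit flipping and symbol sliding algorithms

| Index (i) | (d, k) constrained phrase | Maxentropic prob. ($\Lambda_{d,k}(i)$) | Bit stuffing ($v^0_i$) | Bit flipping ($v^1_i$) | Symbol sliding with index j ($v^j_i$) |
|---|---|---|---|---|---|
| 0 | $0^k1$ | $\lambda_{d,k}^{-(k+1)}$ | $p^{k-d}$ | $p^{k-d-1}(1-p)$ | $p^{k-d-1}(1-p)$ |
| 1 | $0^{k-1}1$ | $\lambda_{d,k}^{-k}$ | $p^{k-d-1}(1-p)$ | $p^{k-d}$ | $p^{k-d-2}(1-p)$ |
| ... | ... | ... | ... | ... | ... |
| i-1 | $0^{k-i+1}1$ | $\lambda_{d,k}^{-(k-i+2)}$ | $p^{k-d-i+1}(1-p)$ | $p^{k-d-i+1}(1-p)$ | $p^{k-d-i}(1-p)$ |
| j | $0^{k-j}1$ | $\lambda_{d,k}^{-(k-j+1)}$ | $p^{k-d-j}(1-p)$ | $p^{k-d-j}(1-p)$ | $p^{k-d}$ |
| j+1 | $0^{k-j-1}1$ | $\lambda_{d,k}^{-(k-j)}$ | $p^{k-d-j-1}(1-p)$ | $p^{k-d-j-1}(1-p)$ | $p^{k-d-j-1}(1-p)$ |
| ... | ... | ... | ... | ... | ... |
| k-d-1 | $0^{d+1}1$ | $\lambda_{d,k}^{-(d+2)}$ | $p(1-p)$ | $p(1-p)$ | $p(1-p)$ |
| k-d | $0^d1$ | $\lambda_{d,k}^{-(d+1)}$ | $1-p$ | $1-p$ | $1-p$ |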

Case 2: j = 1

This is identical to the bit flipping algorithm. We first consider $k > d + 2$. For any such given (d, k) constraint, the following must hold (see Table III) in order for $v^1 = \Lambda$.

$$p = \lambda^{-1} \tag{11}$$

$$1 - p = \lambda^{-(d+1)} \tag{12}$$

$$p^{k-d} = \lambda^{-k} \tag{13}$$

$$p^{k-d-1}(1-p) = \lambda^{-(k+1)} \tag{14}$$

(11) and (12) together imply that $\lambda^{-1} + \lambda^{-(d+1)} = 1$. As in the previous case, this is impossible unless $k = \infty$.

Next, let $k = d + 2$. As before, from Table III we obtain the following conditions in order for $v^1 = \Lambda$.

$$1 - p = \lambda^{-(d+1)} \tag{15}$$

$$p^2 = \lambda^{-(d+2)} \tag{16}$$

$$p(1-p) = \lambda^{-(d+3)} \tag{17}$$

From (16) we have $p = \lambda^{-(d/2+1)}$. Using this and (15) in (17), we see that $\frac{d}{2} + 1 + d + 1 = d + 3$, or d = 2. This implies that SS(1) is optimal for the (2, 4) constraint, as stated in Case 3) of the theorem.

Finally, let $k = d + 1$. This means that we only have indices i = 0, 1 in Table III. The bit flipping algorithm in this case is exactly the bit stuffing algorithm run on the corresponding flipped (ones changed to zeros and vice versa) biased bit stream. Hence, for any bit stuffing bias p, a bit flipping bias of $(1 - p)$ achieves the same rate. This means that a bias of $1 - \lambda^{-(d+2)} = \lambda^{-(d+1)}$ is optimal for $(d, d + 1)$ bit flipping, as stated in Case 2) of the theorem.

We remark that the optimality of bit flipping for the (2, 4) constraint is possible only because of the binary capacity equality C(2, 4) = C(1, 2). The reason is as follows. We have seen that bit stuffing and bit flipping achieve capacity for $(d, \infty)$ and $(d, d + 1)$ constraints. In both these cases, there is exactly one state in the FSTD that has two outgoing branches. This implies that a single DT can provide the required degree of freedom, and is sufficient to generate maxentropic sequences. With d = 1, we can hence generate maxentropic (1, 2) sequences using either bit stuffing or bit flipping. We can transform a maxentropic (1, 2) sequence to a maxentropic (2, 4) sequence using the following two operations:

• Replace the sequence of phrases $0^11\,0^11$ with the (2, 4) phrase $0^31$

• Replace the sequence of phrases $0^11\,0^21$ with the (2, 4) phrase $0^41$

This is equivalent to saying that since C(2, 4) = C(1, 2), we have $\lambda_{2,4} = \lambda_{1,2}$ and hence the maxentropic $0^31$ and $0^41$ phrase probabilities can be written as $\lambda_{2,4}^{-4} = \lambda_{1,2}^{-2}\lambda_{1,2}^{-2}$ and $\lambda_{2,4}^{-5} = \lambda_{1,2}^{-2}\lambda_{1,2}^{-3}$, respectively. Here $\lambda_{1,2}^{-2}\lambda_{1,2}^{-2}$ denotes the probability of the concatenated (1, 2) phrases $0^11\,0^11$, and $\lambda_{1,2}^{-2}\lambda_{1,2}^{-3}$ is the probability of the concatenated (1, 2) phrases $0^11\,0^21$. Note that there is no rate loss in the two operations.

Case 3: $2 \le j \le k - d - 1$

The above range of j implies that we are dealing only with constraints (d, k) for which $k \ge d + 3$. As in the previous two cases, we can derive the set of conditions from Table III.

$$p = \lambda^{-1} \tag{18}$$

$$1 - p = \lambda^{-(d+1)} \tag{19}$$

$$p^{k-d} = \lambda^{-(k-j+1)} \tag{20}$$

Once again, the above three conditions cannot be simultaneously satisfied unless $k = \infty$. Hence, we conclude that SS(j), $2 \le j \le k - d - 1$, cannot achieve capacity for any (d, k).

Case 4: $j = k - d$ and $j \ge 2$

It was shown in Lemma 1 that $j = k - d$ is optimal for $k = 2d + 1$. We will now show that $(d, 2d + 1)$ are the only set of constraints for which SS(k - d) is capacity achieving. From Table III we note that the following conditions need to be satisfied for SS(k - d) to be optimal for any given (d, k). Recall that $j \ge 2$ and therefore $k - d \ge 2$.

$$p = \lambda^{-1} \tag{21}$$

$$1 - p = \lambda^{-(d+2)} \tag{22}$$

$$p^{k-d} = \lambda^{-(d+1)} \tag{23}$$

From (23) and (21) above, we require that $k - d = d + 1$, or $k = 2d + 1$. It turns out (see Lemma 1) that this value of k satisfies condition (22) by virtue of the binary capacity equality $C(d, 2d + 1) = C(d + 1, \infty)$.

The theorem statement now follows from Cases 1 through 4 above. Note that in the process, we have also shown that for all constraints (d, k) with $k \neq d + 1$, $k \neq \infty$, $k \neq 2d + 1$ and $(d, k) \neq (2, 4)$, the maximum average rate of SS(j), $\forall\, 0 \le j \le k - d$, is strictly less than capacity. ■

Theorem 2: Let $0 \le d < k < \infty$. Then for $0 < j \le k - d$, the average rate of SS(j) is greater than the average rate of SS(j - 1) if and only if $p > \lambda_{j-1,\infty}^{-1}$.

Proof: Let us denote by $R_j(p, d, k)$ the average information rate of SS(j) for a given constraint (d, k) and bias p. We then have

$$R_j(p,d,k) = h(p)\,\frac{L_{in}}{L^j_{out}}, \tag{24}$$

where $L_{in}$ and $L^j_{out}$ represent the average word lengths at the input and output to the SS(j) constrained encoder, respectively (see Fig. 3). It can be seen that $L_{in}$ does not depend on the sliding index and is identical for all j, $0 \le j \le k - d$. Hence, for a given bias p, $L^j_{out}$ is the important factor in comparing the information rates of SS(j) and SS(j - 1). It is given by

$$L^j_{out} = \sum_{i=0}^{k-d} v^j_i\, l^j_i \tag{25}$$

where $l^j_i$ is the length of the codeword (or (d, k) constrained phrase) corresponding to the phrase probability $v^j_i$ listed in Table III. For example, index $i = j - 1$ has $v^j_{j-1} = p^{k-d-j}(1-p)$ and $l^j_{j-1} = k - j + 2$. Now, consider the difference of average output word lengths $L^{j-1}_{out} - L^j_{out}$. This is computed from (25) to be

$$L^{j-1}_{out} - L^j_{out} = p^{k-d} - p^{k-d-j}(1-p) \tag{26}$$

From (24) and (26), we can derive the condition $R_j(p, d, k) > R_{j-1}(p, d, k)$ if and only if $p^j + p > 1$. The proof is now completed using the fact that the only real, positive root of $p^j + p = 1$ is $\lambda_{j-1,\infty}^{-1}$. ■

As a consequence of Theorem 2, we state that if the rate maximizing bias for SS(j - 1) is greater than $\lambda_{j-1,\infty}^{-1}$, then SS(j) achieves a higher maximum information rate than SS(j - 1) for the given (d, k) constraint.

Theorem 3: The average information rate of SS(j) is given by

$$R_j(p,d,k) = h(p)\,\frac{1 - p^{k-d}}{1 - p^{k-d} + (1-p)\left(p^{k-d-j} - j\,p^{k-d} + d\right)}$$

Proof: We start with (24), wherein

$$R_j(p,d,k) = h(p)\,\frac{L_{in}}{L^j_{out}}$$

and write out the expressions for $L_{in}$ and $L^j_{out}$. $L_{in}$ is the average message word length into the constrained encoder of Fig. 3. Since it is independent of the sliding index j, we set j = 0 and compute $L_{in}$ from Table I. It is given by

$$L_{in} = \sum_{i=0}^{k-d} v^0_i\, l_i \tag{27}$$

where $l_i$ is the length of the corresponding message word listed in Table I. For example, index $i = k - d - 1$ has $v^0_i = p(1-p)$ and $l_i = 2$. Writing this out, we obtain

\begin{align}
L_{in} &= \sum_{i=0}^{k-d} v^0_i\, l_i \tag{28} \\
&= (1-p) + 2p(1-p) + 3p^2(1-p) + \cdots + (k-d)\,p^{k-d-1}(1-p) + (k-d)\,p^{k-d} \tag{29} \\
&= 1 + p + p^2 + p^3 + p^4 + \cdots + p^{k-d-1} \tag{30} \\
&= \frac{1 - p^{k-d}}{1 - p} \tag{31}
\end{align}

where (30) is a direct simplification of (29).

Similarly, we now write out the expression for $L^j_{out}$, the average codeword length at the output of the constrained encoder. Clearly, this is dependent on the sliding index j. We start with the expression in (25) and write out the individual terms.

\begin{align}
L^j_{out} &= \sum_{i=0}^{k-d} v^j_i\, l^j_i \tag{32} \\
&= (1-p)(d+1) + p(1-p)(d+2) + p^2(1-p)(d+3) + \cdots + p^{k-d-j-1}(1-p)(k-j) \notag \\
&\quad + p^{k-d}(k-j+1) + p^{k-d-j}(1-p)(k-j+2) + \cdots + p^{k-d-1}(1-p)(k+1) \tag{33}
\end{align}

Now let

\begin{align}
S &= L^0_{out} \tag{34} \\
&= (1-p)(d+1) + p(1-p)(d+2) + \cdots + p^{k-d-1}(1-p)\,k + p^{k-d}(k+1) \tag{35} \\
&= d + 1 + p + p^2 + p^3 + \cdots + p^{k-d} \tag{36} \\
&= d + \frac{1 - p^{k-d+1}}{1 - p} \tag{37}
\end{align}

where (35) follows from (33) with j = 0. Using (33), (35) and (37), we get

\begin{align}
L^j_{out} &= (1-p)(d+1) + p(1-p)(d+2) + p^2(1-p)(d+3) + \cdots + p^{k-d-j-1}(1-p)(k-j) \notag \\
&\quad + p^{k-d}(k-j+1) + p^{k-d-j}(1-p)(k-j+2) + \cdots + p^{k-d-1}(1-p)(k+1) \tag{38} \\
&= S - j\,p^{k-d} + p^{k-d-j} - p^{k-d} \tag{39} \\
&= S - (j+1)\,p^{k-d} + p^{k-d-j} \tag{40} \\
&= d + \frac{1 - p^{k-d+1}}{1 - p} - j\,p^{k-d} + p^{k-d-j} - p^{k-d} \tag{41} \\
&= \frac{1 - p^{k-d} + (1-p)\left(p^{k-d-j} - j\,p^{k-d} + d\right)}{1 - p} \tag{42}
\end{align}

Substituting (31) and (42) into (24), we obtain the expression for information rate as

$$R_j(p,d,k) = h(p)\,\frac{1 - p^{k-d}}{1 - p^{k-d} + (1-p)\left(p^{k-d-j} - j\,p^{k-d} + d\right)} \tag{43}$$

■

In Theorem 3, we obtained an expression for the average information rate of SS(j) in terms of the bias p, sliding index j and constraint parameters d, k. For a given constraint (d, k), we are now interested in determining the values of p and j that jointly maximize $R_j(p, d, k)$. However, the complexity of the rate expression in (43) makes further analysis difficult. For this reason, optimization for both p and j is done numerically. Rate improvements for some important constraints are summarized in Table IV.
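
One simple way to carry out such a numerical optimization is a grid search over p for every sliding index j. The sketch below is only an illustration: the grid resolution, the bisection-based `capacity` helper and all function names are assumptions of this example, and the search procedure is not necessarily the one used to produce Table IV.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def capacity(d, k, iters=100):
    """log2 of the positive root of sum_{t=d+1}^{k+1} z^-t = 1 (finite k)."""
    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(mid ** -t for t in range(d + 1, k + 2)) > 1:
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)

def rate(p, d, k, j):
    """R_j(p, d, k) as given in (43)."""
    n = k - d
    num = 1 - p ** n
    den = num + (1 - p) * (p ** (n - j) - j * p ** n + d)
    return h(p) * num / den

def best(d, k, steps=9999):
    """Grid search over bias p and sliding index j; returns (rate, j, p)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max((rate(p, d, k, j), j, p)
               for j in range(k - d + 1) for p in grid)
```

Dividing the first entry of `best(d, k)` by `capacity(d, k)` gives an efficiency estimate that can be compared with the entries of Table IV, up to the resolution of the grid.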



IV. Construction 2: Optimal Codes Using Interleaving 

Thus far, in Sections II and III, we have studied the bit stuffing and bit flipping algorithms and proposed the symbol sliding algorithm to generate (d, k) constrained sequences. All three of these constructions used a single DT to generate an appropriately biased, i.i.d. bit stream, which was then encoded into constrained phrases. Recently, Wolf [?] observed that with the use of multiple such DTs, optimal bit stuffing encoders could be constructed for all values of d and k. The idea is to generate several distinct biased streams, one each for a state in the FSTD that has two outgoing branches (see Fig. 2). Since the number of such states is $k - d$ for $k < \infty$, we need precisely that many DTs to construct optimal codes in this fashion. We will refer to this scheme as the multiple DT construction.






TABLE IV

Simulation Results of Rate Improvements for some constraints

| (d, k) | Shannon capacity C(d, k) | Maximum bit stuffing efficiency (%) | Maximum bit flipping efficiency (%) | Maximum symbol sliding efficiency (%) | Maximizing symbol sliding index j |
|---|---|---|---|---|---|
| (1, 3) | 0.5515 | 98.93 | 99.74 | 100 | 2 |
| (1, 7) | 0.6793 | 99.42 | 99.79 | 99.79 | 1 |
| (2, 5) | 0.4650 | 98.47 | 99.74 | 100 | 3 |
| (2, 10) | 0.5418 | 99.39 | 99.70 | 99.87 | 2 |
| (3, 6) | 0.3746 | 98.23 | 99.57 | 99.89 | 2 |
| (4, 8) | 0.3432 | 98.02 | 99.16 | 99.91 | 4 |
| (5, 9) | 0.2979 | 97.82 | 98.89 | 99.77 | 3 |

In this section, we show that certain classes of (d, k) constraints allow optimal encoding using fewer than $k - d$ DTs. This is derived from the factorization of characteristic (d, k) polynomials and can be implemented using interleaving. We first describe such a construction for $(d, d + 2^m - 1)$ constraints, $1 \le m < \infty$, and then generalize to other constraints.

A. Optimal $(d, d + 2^m - 1)$ Codes, $1 \le m < \infty$

In order to understand the idea behind our code construction, we first briefly review the relationship between factorization of characteristic (d, k) polynomials and interleaving. Recall that the characteristic polynomial of the (d, k) constraint, $k < \infty$, is given by (1)

$$H_{d,k}(z) = \sum_{j=d+1}^{k+1} z^{-j}$$

From Z-transforms, we know that $z^{-j}$ indicates a delay of j time periods. For our use of $z^{-j}$, j denotes phrase length in bits. Hence, the characteristic polynomial $H_{d,k}(z)$ indicates that a (d, k) sequence is the concatenation of independent phrases from the finite set $X = \{0^d1, 0^{d+1}1, \ldots, 0^{k-1}1, 0^k1\}$.

As before, if $k = \infty$, then $X = \{0, 0^d1\}$.

Factorization of $H_{d,k}(z)$ has the interpretation of interleaving phrases corresponding to the individual
factors. For example, consider the characteristic polynomial of the $(1, 4)$ constraint, $H_{1,4}(z) = z^{-2} +
z^{-3} + z^{-4} + z^{-5}$. This can be factored as $H_{1,4}(z) = (z^{-1} + z^{-2})(z^{-1} + z^{-3}) = H_{1,\infty}(z) H_{2,\infty}(z)$. The
term $(z^{-1} + z^{-2})$ represents a source that independently produces phrases of length one or two bits
(or a source that produces a $(1, \infty)$ constrained stream). Similarly, $(z^{-1} + z^{-3})$ represents a source that
independently produces phrases of length one or three bits (or a source that produces a $(2, \infty)$ constrained
stream). Interleaving phrases from these two sources yields a sequence of independent, concatenated
phrases of length two, three, four or five bits, which is in turn described by $z^{-2} + z^{-3} + z^{-4} + z^{-5}$,
the characteristic $(1, 4)$ polynomial. This gives the equivalence between interleaving and factorization.
Note that the interleaving is based on the length of individual phrases and not their representations.
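The $(1, 4)$ example can be checked by brute force. The following is a minimal sketch in Python; the helper name `interleaved_lengths` is an illustrative choice and does not come from the paper. It pairs one phrase length from each source and counts the resulting total lengths, which should reproduce the unit coefficients of $z^{-2}, \ldots, z^{-5}$.

```python
from collections import Counter
from itertools import product


def interleaved_lengths(source_a, source_b):
    """Count total lengths obtained by taking one phrase length from each source."""
    return Counter(a + b for a, b in product(source_a, source_b))


# Phrase lengths (exponents of z^-1) of the two sources in the (1, 4) example.
lengths_1_inf = [1, 2]  # z^-1 + z^-2, i.e. H_{1,inf}(z)
lengths_2_inf = [1, 3]  # z^-1 + z^-3, i.e. H_{2,inf}(z)

combined = interleaved_lengths(lengths_1_inf, lengths_2_inf)
print(sorted(combined.items()))  # [(2, 1), (3, 1), (4, 1), (5, 1)]
```

Each length from two to five appears exactly once, which is the statement that the product of the two factors equals $H_{1,4}(z)$.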

Now consider the characteristic polynomial of the $(d, d + 2^m - 1)$ constraint, $H_{d,d+2^m-1}(z) = \sum_{j=d+1}^{d+2^m} z^{-j}$.
This can be factored as

$$H_{d,d+2^m-1}(z) = \sum_{j=d+1}^{d+2^m} z^{-j} \tag{44}$$

$$= z^{-(d+1)} \prod_{i=1}^{m} \left(1 + z^{-2^{(i-1)}}\right) \tag{45}$$

$$= z^{-(d-m+1)} \prod_{i=1}^{m} H_{2^{i-1},\infty}(z) \tag{46}$$

(46) shows that $H_{d,d+2^m-1}(z)$ can be written as the product of $m$ characteristic $(d, k)$ polynomials, each
with $k = \infty$ and some $d > 0$. The term $z^{-(d-m+1)}$ up front in (46) merely acts as additional "delay"
(or phrase length). Our code constructions are applicable even when $(d - m + 1) < 0$ in (46).

It is known from a result of Bender and Wolf [2] that the bit stuffing algorithm is optimal for all
$(d, \infty)$ constraints, $d > 0$. Recall that bit stuffing uses just a single DT. Hence, optimal codes can
be constructed for $(d, d + 2^m - 1)$ constraints using exactly $m$ DTs, one for each factor $H_{2^{i-1},\infty}(z)$,
$i = 1, 2, \ldots, m$, in (46), and then suitably interleaving and encoding the biased streams. This is in
comparison to the $2^m - 1$ DTs that would be needed with the multiple DT construction.
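The factorization into $m$ factors of the form $H_{2^{i-1},\infty}(z)$ can be verified numerically for small $d$ and $m$. The sketch below is an illustration only: polynomials in $z^{-1}$ are stored as dictionaries mapping delay to coefficient, and the function names (`poly_mul`, `h_dk`, `h_d_inf`, `factored`) are assumptions made for this example.

```python
from collections import Counter


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 stored as {delay: coefficient}."""
    out = Counter()
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] += ca * cb
    return dict(out)


def h_dk(d, k):
    """Characteristic polynomial of the (d, k) constraint, k finite."""
    return {j: 1 for j in range(d + 1, k + 2)}


def h_d_inf(d):
    """Characteristic polynomial z^-1 + z^-(d+1) of the (d, inf) constraint, d >= 1."""
    return {1: 1, d + 1: 1}


def factored(d, m):
    """Delay term z^-(d-m+1) times the m factors H_{2^(i-1),inf}(z), i = 1..m."""
    result = {d - m + 1: 1}  # the delay may be negative; that is allowed here
    for i in range(1, m + 1):
        result = poly_mul(result, h_d_inf(2 ** (i - 1)))
    return result


# Compare both sides for a few small cases, including negative leading delays.
for d in range(0, 5):
    for m in range(1, 5):
        assert factored(d, m) == h_dk(d, d + 2 ** m - 1)
```

The loop includes cases such as $d = 0$, $m = 4$, where the leading delay $d - m + 1$ is negative, and the product still matches the characteristic polynomial.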



[Figure: the input is a binary stream, i.i.d. and unbiased; the output is a $(d, d + 2^m - 1)$ constrained stream.]

Fig. 4

Block diagram of the $(d, d + 2^m - 1)$ code construction by interleaving. $\lambda$ denotes the positive real root of
$H_{d,d+2^m-1}(z) = 1$.



We now describe how the interleaving and encoding can be performed so that the codes produced
are optimal. The block diagram in Fig. 4 outlines our construction. First, the input is split into $m$
distinct streams using a serial to parallel (S/P) converter. These $m$ streams then act as inputs to the $m$
DTs. As before, DT($x$) denotes a distribution transformer that outputs a binary i.i.d. stream with bias $x$
($\Pr\{0\} = x$) in response to an unbiased, i.i.d., binary input stream. The biases of the $m$ DTs are chosen
so as to generate maxentropic $(d, d + 2^m - 1)$ constrained phrases out of the encoder. It is known from a
result of Zehavi and Wolf that the maxentropic phrase probabilities are $\lambda^{-i}$ for a constrained phrase
of length $i$. Hence, we work backwards to determine the biases of the $m$ DTs, which turn out to be $\frac{1}{1 + \lambda^{-2^l}}$,
$l = 0, 1, 2, \ldots, m - 1$.

The $m$ biased bit streams now act as inputs to the bit interleaver. The bit interleaver produces a
binary sequence $u = (u_1 u_2 \cdots u_m) \in \{0, 1\}^m$ by interleaving the $m$ biased streams one bit at a time,
in the specified order ($u_1$ is the MSB and $u_m$ the LSB). Finally, the encoder maps the binary sequence
$u$ of decimal value $j$ to the $(d, k)$ constrained phrase $0^{d+j} 1$ (a string of $(d + j)$ zeros followed by a one),
$j = 0, 1, 2, \ldots, 2^m - 1$. Table V specifies such an encoder mapping for $(d, d + 7)$ constraints. The size
of this table is 8 in the example and $k - d + 1 = 2^m$ in general.
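The "working backwards" step can be illustrated with a short calculation. The sketch below makes two assumptions that the text does not spell out: the stream feeding the bit of weight $2^i$ is the one with $\Pr\{0\} = 1/(1 + \lambda^{-2^i})$, and $\lambda$ is found by simple bisection on $(1, 2]$. All function names are hypothetical. It enumerates every interleaved word $u$, maps it to its phrase, and compares the resulting phrase probability with $\lambda^{-i}$ for phrase length $i$.

```python
from itertools import product


def capacity_root(d, k):
    """Bisection for the root in (1, 2] of sum_{j=d+1}^{k+1} x^-j = 1 (requires k > d)."""
    def f(x):
        return sum(x ** -j for j in range(d + 1, k + 2)) - 1.0
    lo, hi = 1.0, 2.0  # f(lo) > 0 and f(hi) < 0, with f decreasing in x
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def dt_biases(lam, m):
    """Assumed Pr{0} of the stream feeding the bit of weight 2^i, i = 0..m-1."""
    return [1.0 / (1.0 + lam ** -(2 ** i)) for i in range(m)]


def phrase_probabilities(d, m):
    """Map every m-bit word u (u_1 is the MSB) to its phrase and its probability."""
    lam = capacity_root(d, d + 2 ** m - 1)
    bias = dt_biases(lam, m)
    probs = {}
    for bits in product((0, 1), repeat=m):
        j = int("".join(map(str, bits)), 2)  # decimal value of u
        p = 1.0
        for pos, b in enumerate(bits):
            x = bias[m - 1 - pos]  # bit at position pos has weight 2^(m-1-pos)
            p *= x if b == 0 else 1.0 - x
        probs["0" * (d + j) + "1"] = p
    return lam, probs


lam, probs = phrase_probabilities(1, 2)  # the (1, 4) constraint
for phrase, p in sorted(probs.items(), key=lambda item: len(item[0])):
    print(phrase, round(p, 6), round(lam ** -len(phrase), 6))
```

Under these assumptions the two printed columns agree for every phrase, which is the maxentropic property the biases are chosen to produce.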

The construction described above requires $m$ DTs, one $m$-bit interleaver and one variable length
encoder. Hence, the number of required DTs is $\log_2(k - d + 1)$, as opposed to $k - d$ with the multiple
DT construction. Next, we prove the optimality of our code construction.

Theorem 4: The encoding procedure outlined in Fig. 4 constructs optimal $(d, d + 2^m - 1)$ codes.
Proof: In our construction, the biases of the $m$ DTs were chosen so as to generate maxentropic
$(d, d + 2^m - 1)$ constrained phrases at the output. Hence, our codes are optimal by the maxentropic
property. We provide a complete proof in the Appendix. ■
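A numerical sanity check of Theorem 4 is straightforward. The sketch below assumes ideal DTs, so that a stream with bias $x$ carries $h(x)$ input bits per biased bit, where $h$ is the binary entropy function; the function names are illustrative. It computes the ratio of the total entropy of the $m$ biased streams to the average output phrase length and compares it with $\log_2 \lambda$.

```python
import math


def capacity_root(d, k):
    """Bisection for the root in (1, 2] of sum_{j=d+1}^{k+1} x^-j = 1 (requires k > d)."""
    def f(x):
        return sum(x ** -j for j in range(d + 1, k + 2)) - 1.0
    lo, hi = 1.0, 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def binary_entropy(x):
    """Binary entropy h(x) in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def interleaving_rate(d, m):
    """Return (rate of the interleaving construction, capacity) for (d, d + 2^m - 1)."""
    lam = capacity_root(d, d + 2 ** m - 1)
    # Input bits consumed per phrase: one biased bit from each of the m DTs.
    info_bits = sum(binary_entropy(1.0 / (1.0 + lam ** -(2 ** i))) for i in range(m))
    # Average output phrase length under the maxentropic phrase probabilities.
    l_out = sum((d + j) * lam ** -(d + j) for j in range(1, 2 ** m + 1))
    return info_bits / l_out, math.log2(lam)


for d, m in [(1, 2), (0, 3), (2, 4)]:
    rate, cap = interleaving_rate(d, m)
    print((d, d + 2 ** m - 1), round(rate, 6), round(cap, 6))
```

For each constraint tried, the two printed values coincide to within floating-point error, in line with the theorem.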
TABLE V

Encoder mapping for the $(d, d + 7)$ constraint

| Interleaved binary sequence $u = (u_1 u_2 u_3)$ | Corresponding $(d, d + 7)$ constrained phrase |
|---|---|
| 000 | $0^{d} 1$ |
| 001 | $0^{d+1} 1$ |
| 010 | $0^{d+2} 1$ |
| 011 | $0^{d+3} 1$ |
| 100 | $0^{d+4} 1$ |
| 101 | $0^{d+5} 1$ |
| 110 | $0^{d+6} 1$ |
| 111 | $0^{d+7} 1$ |



B. Other Constraints 

We now extend the interleaving construction proposed in Section IV-A to a wider class of $(d, k)$
constraints. The idea is to derive appropriate factorizations for general characteristic $(d, k)$ polynomials,
$k < \infty$. As before, we start with the characteristic polynomial

$$H_{d,k}(z) = \sum_{j=d+1}^{k+1} z^{-j} \tag{47}$$

The number of terms in the summation in (47) is equal to $k - d + 1$. Let $k - d + 1$ be factored into the
product of primes as

$$k - d + 1 = \prod_{i=1}^{n} p_i \tag{48}$$

Now define $\eta_i = \prod_{l=1}^{i} p_l$, $i = 1, 2, \ldots, n$, with $\eta_0 = 1$. It follows that $H_{d,k}(z)$ can be factored as

$$H_{d,k}(z) = z^{-(d+1)} \prod_{i=1}^{n} F^{i}_{d,k}(z), \tag{49}$$

where each $F^{i}_{d,k}(z)$, $i = 1, 2, \ldots, n$, is of the form

$$F^{i}_{d,k}(z) = 1 + z^{-\eta_{i-1}} + z^{-2\eta_{i-1}} + \cdots + z^{-(p_i - 1)\eta_{i-1}} \tag{50}$$

Each factor $F^{i}_{d,k}(z)$ has $p_i$ terms and can be realized using $(p_i - 1)$ DTs. Hence, the total number
of DTs required is $\sum_{i=1}^{n} (p_i - 1)$. As long as $k - d + 1$ is not prime, and the number of factors $n$ is
greater than one, this is strictly less than the $k - d$ DTs required in the multiple DT construction. As
an example, we now describe in detail our construction for the $(0, 11)$ constraint.
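Before turning to that example, the general factorization and the resulting DT count can be checked numerically. The following sketch is an illustration with hypothetical function names; it takes the primes in nondecreasing order, which is one of several valid orderings, and stores polynomials in $z^{-1}$ as dictionaries mapping delay to coefficient.

```python
from collections import Counter


def prime_factors(n):
    """Prime factorization of n >= 2 by trial division, in nondecreasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 stored as {delay: coefficient}."""
    out = Counter()
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] += ca * cb
    return dict(out)


def factorized_polynomial(d, k):
    """Build z^-(d+1) times the factors F^i, where F^i has p_i terms spaced eta_{i-1} apart."""
    primes = prime_factors(k - d + 1)
    result = {d + 1: 1}  # leading term z^-(d+1)
    eta = 1              # eta_0 = 1
    for p in primes:
        factor = {t * eta: 1 for t in range(p)}  # 1 + z^-eta + ... + z^-(p-1)eta
        result = poly_mul(result, factor)
        eta *= p         # eta_i = eta_{i-1} * p_i
    return primes, result


def dt_count(d, k):
    """Number of DTs used by the interleaving construction: sum of (p_i - 1)."""
    return sum(p - 1 for p in prime_factors(k - d + 1))


primes, poly = factorized_polynomial(0, 11)
print(primes)                            # [2, 2, 3]
print(sorted(poly) == list(range(1, 13)))  # True
print(dt_count(0, 11), 11 - 0)           # 4 11
```

For the $(0, 11)$ constraint this gives 4 DTs against the 11 of the multiple DT construction, while a prime value of $k - d + 1$, such as for $(1, 3)$, gives no saving.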

The characteristic $(0, 11)$ polynomial can be factored as

$$H_{0,11}(z) = \sum_{j=1}^{12} z^{-j} \tag{51}$$

$$= z^{-1} \left(1 + z^{-1}\right) \left(1 + z^{-2}\right) \left(1 + z^{-4} + z^{-8}\right) \tag{52}$$

Fig. 5 shows the code construction that uses 4 DTs, one 4-bit interleaver and one variable length
encoder. The biases of the 4 DTs are determined exactly as in Section IV-A by working backwards
from a maxentropic output. The DTs with bias $\frac{1}{1 + \lambda^{-1}}$ and $\frac{1}{1 + \lambda^{-2}}$ correspond to factors $(1 + z^{-1})$ and
$(1 + z^{-2})$, respectively. The remaining two DTs with bias $\frac{1}{1 + \lambda^{-4}}$ and $\frac{1}{1 + \lambda^{-4} + \lambda^{-8}}$ both correspond to the
factor $(1 + z^{-4} + z^{-8})$.
The interleaver functionality is slightly more complex in this case. If $u_1 = 1$, the interleaver generates
a binary sequence $u = (u_1 u_2 u_3 u_4)$ by interleaving the 4 biased streams one bit at a time in the specified
order ($u_1$ is the MSB and $u_4$ the LSB). If $u_1 = 0$, the interleaver skips the second biased stream (shown
in dotted lines) and outputs the binary sequence $u = (u_1 u_3 u_4)$. The encoder then maps the binary
sequence $u$ to $(0, 11)$ constrained phrases as specified in Table VI. The size of this table is 12 for this
example and $k - d + 1$ in general.




Fig. 5

Block diagram of the $(0, 11)$ code construction by interleaving. $\lambda$ denotes the positive real root of
$H_{0,11}(z) = 1$.



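TABLE VI

Encoder mapping for the $(0, 11)$ constraint

| Interleaved binary sequence $u$ | Corresponding $(0, 11)$ constrained phrase |
|---|---|
| 000 | $1$ |
| 001 | $0 1$ |
| 010 | $0^{2} 1$ |
| 011 | $0^{3} 1$ |
| 1000 | $0^{4} 1$ |
| 1001 | $0^{5} 1$ |
| 1010 | $0^{6} 1$ |
| 1011 | $0^{7} 1$ |
| 1100 | $0^{8} 1$ |
| 1101 | $0^{9} 1$ |
| 1110 | $0^{10} 1$ |
| 1111 | $0^{11} 1$ |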



The code construction described above requires 4 DTs, as opposed to 11 DTs required with the
multiple DT construction. The proof of optimality of the code construction in Fig. 5 proceeds similarly
to that of Theorem 4 and is hence omitted in the interest of space.
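Although the proof is omitted, the $(0, 11)$ construction can be checked numerically. The sketch below rests on an assumption about which stream carries which bias: $u_1$ is taken from the DT with $\Pr\{0\} = 1/(1 + \lambda^{-4} + \lambda^{-8})$, the skippable second stream $u_2$ from the DT with bias $1/(1 + \lambda^{-4})$, and $u_3$, $u_4$ from the DTs with biases $1/(1 + \lambda^{-2})$ and $1/(1 + \lambda^{-1})$. The function names are hypothetical. It rebuilds Table VI and compares each phrase probability with $\lambda^{-i}$ for phrase length $i$.

```python
from itertools import product


def capacity_root(d, k):
    """Bisection for the root in (1, 2] of sum_{j=d+1}^{k+1} x^-j = 1 (requires k > d)."""
    def f(x):
        return sum(x ** -j for j in range(d + 1, k + 2)) - 1.0
    lo, hi = 1.0, 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def encoder_table_0_11():
    """Return lambda and {u: (phrase, probability)} for the (0, 11) construction."""
    lam = capacity_root(0, 11)
    x1 = 1.0 / (1.0 + lam ** -4 + lam ** -8)  # assumed Pr{u1 = 0}
    x2 = 1.0 / (1.0 + lam ** -4)              # assumed Pr{u2 = 0}, used only if u1 = 1
    x3 = 1.0 / (1.0 + lam ** -2)              # assumed Pr{u3 = 0}
    x4 = 1.0 / (1.0 + lam ** -1)              # assumed Pr{u4 = 0}

    def pr(x, bit):
        return x if bit == 0 else 1.0 - x

    table = {}
    for u1, u2, u3, u4 in product((0, 1), repeat=4):
        if u1 == 0:
            if u2 == 1:
                continue  # second stream is skipped when u1 = 0; count each word once
            key = f"{u1}{u3}{u4}"
            zeros = 2 * u3 + u4
            prob = pr(x1, 0) * pr(x3, u3) * pr(x4, u4)
        else:
            key = f"{u1}{u2}{u3}{u4}"
            zeros = 4 + 4 * u2 + 2 * u3 + u4
            prob = pr(x1, 1) * pr(x2, u2) * pr(x3, u3) * pr(x4, u4)
        table[key] = ("0" * zeros + "1", prob)
    return lam, table


lam, table = encoder_table_0_11()
for u, (phrase, prob) in table.items():
    print(u, phrase, round(prob, 6), round(lam ** -len(phrase), 6))
```

With this assignment the 12 entries reproduce the mapping of Table VI and every phrase probability equals its maxentropic value, which is consistent with the claimed optimality.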

V. CONCLUSION 

We introduced two new code constructions for the $(d, k)$ constraint. First, we proposed the symbol
sliding algorithm, which improves bit stuffing and bit flipping rates, and additionally achieves capacity
for $(d, 2d + 1)$ constraints. The main idea behind symbol sliding is to generate constrained phrases with
probabilities that closely match those of the maxentropic sequence. We showed that this can be done by
switching phrase probabilities from the bit stuffing algorithm. Furthermore, symbol sliding requires just
one distribution transformer (DT), thus maintaining the simplicity of bit stuffing and bit flipping.

Our second construction was inspired by a recent generalization of bit stuffing proposed by Wolf,
where $k - d$ biased bit streams are used to construct optimal $(d, k)$ sequences for all $k < \infty$.
Here, we observed that the factorization of certain characteristic $(d, k)$ polynomials could be used to
construct optimal codes with fewer than $k - d$ DTs. This scheme was implemented using interleaving.
In particular, we showed that optimal $(d, d + 2^m - 1)$ codes, $1 < m < \infty$, could be constructed using
just $m$ DTs.

We note that the optimality of the two constructions proposed in this work has different origins,
even though their implementations are linked through the bit stuffing algorithm. The optimality of symbol
sliding for $(d, 2d + 1)$ constraints is possible only because of the binary capacity equality $C(d, 2d + 1) =
C(d + 1, \infty)$, and the fact that bit stuffing with a single DT achieves $(d, \infty)$ capacity. With our second
construction based on interleaving, the proof of optimality lies in the factorization of characteristic $(d, k)$
polynomials. Hence, with the two different origins of optimality, we believe that further improvements
might be possible with a combination of symbol sliding and interleaving.
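Appendix

Proof of Theorem 4: We will show that the average information rate of the code construction in Fig.
4 equals the capacity of the $(d, d + 2^m - 1)$ constraint. The average information rate is given by

$$R(d, d + 2^m - 1) = \frac{\sum_{i=0}^{m-1} h\left(\frac{1}{1 + \lambda^{-2^i}}\right)}{L_{out}} \tag{53}$$

where $L_{out} = \sum_{j=1}^{2^m} (d + j) \lambda^{-(d+j)}$ is the average phrase length at the output of the encoder. The capacity
of the $(d, d + 2^m - 1)$ constraint can be expressed as

$$C(d, d + 2^m - 1) = \log_2 \lambda = \frac{\sum_{j=1}^{2^m} \lambda^{-(d+j)} \log_2\left(\lambda^{(d+j)}\right)}{L_{out}} \tag{54}$$

Hence, we need to show that $\sum_{i=0}^{m-1} h\left(\frac{1}{1 + \lambda^{-2^i}}\right) = \sum_{j=1}^{2^m} \lambda^{-(d+j)} \log_2\left(\lambda^{(d+j)}\right)$. We will start with the
R.H.S. $= \sum_{j=1}^{2^m} \lambda^{-(d+j)} \log_2\left(\lambda^{(d+j)}\right)$ and show that it is the same as the L.H.S. $= \sum_{i=0}^{m-1} h\left(\frac{1}{1 + \lambda^{-2^i}}\right)$.

$$\sum_{j=1}^{2^m} \lambda^{-(d+j)} \log_2\left(\lambda^{(d+j)}\right) = \sum_{i=0}^{m-1} \log_2\left(1 + \lambda^{-2^i}\right) + \log_2 \lambda \, \frac{\sum_{j=0}^{2^m - 1} j \lambda^{-j}}{\prod_{i=0}^{m-1} \left(1 + \lambda^{-2^i}\right)} \tag{55}$$

$$= \sum_{i=0}^{m-1} \log_2\left(1 + \lambda^{-2^i}\right) + \log_2 \lambda \sum_{i=0}^{m-1} \frac{2^i \lambda^{-2^i}}{1 + \lambda^{-2^i}} \tag{56}$$

$$= \sum_{i=0}^{m-1} \frac{1}{1 + \lambda^{-2^i}} \log_2\left(1 + \lambda^{-2^i}\right) + \sum_{i=0}^{m-1} \frac{\lambda^{-2^i}}{1 + \lambda^{-2^i}} \log_2\left(\frac{1 + \lambda^{-2^i}}{\lambda^{-2^i}}\right) \tag{57}$$

$$= \sum_{i=0}^{m-1} h\left(\frac{1}{1 + \lambda^{-2^i}}\right)$$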



where (55) follows from the substitution $\lambda^{d+1} = \prod_{i=0}^{m-1} \left(1 + \lambda^{-2^i}\right)$, (56) is a result of dividing out the
second term in (55), and (57) is a regrouping of the terms in



